Adversarial imitation learning (AIL) has become a popular alternative to supervised imitation learning that reduces the distribution shift suffered by the latter. However, AIL requires effective exploration during an online reinforcement learning phase. In this work, we show that the standard, naive approach to exploration can manifest as a suboptimal local maximum if a policy learned with AIL sufficiently matches the expert distribution without fully learning the desired task. This can be particularly catastrophic for manipulation tasks, where the difference between an expert and a non-expert state-action pair is often subtle. We present Learning from Guided Play (LfGP), a framework in which we leverage expert demonstrations of multiple exploratory, auxiliary tasks in addition to a main task. The addition of these auxiliary tasks forces the agent to explore states and actions that standard AIL may learn to ignore. Additionally, this particular formulation allows for the reusability of expert data between main tasks. Our experimental results in a challenging multitask robotic manipulation domain indicate that LfGP significantly outperforms both AIL and behaviour cloning, while also being more expert sample efficient than these baselines. To explain this performance gap, we provide further analysis of a toy problem that highlights the coupling between a local maximum and poor exploration, and also visualize the differences between the learned models from AIL and LfGP.
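AIL trains the policy with a reward derived from a learned discriminator D(s, a), which scores how expert-like a state-action pair looks. The sketch below shows one common choice of that reward, r = log D - log(1 - D), which equals the discriminator logit. The abstract does not say which reward form LfGP uses. The linear discriminator and its weights are hypothetical and only for illustration.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def discriminator(state_action, weights, bias):
    """Hypothetical linear discriminator: probability that (s, a) came from the expert."""
    logit = sum(w * x for w, x in zip(weights, state_action)) + bias
    return sigmoid(logit)


def ail_reward(state_action, weights, bias, eps=1e-8):
    """Surrogate reward log D - log(1 - D); eps guards against log(0)."""
    d = discriminator(state_action, weights, bias)
    return math.log(d + eps) - math.log(1.0 - d + eps)


# Example: logit = 2.0 * 1.0 + 1.0 * (-0.5) = 1.5, so the reward is about 1.5.
print(ail_reward([2.0, 1.0], [1.0, -0.5], 0.0))
```

When the policy's state-action distribution already closely matches the expert's, D stays near 0.5 and this reward stays near 0. The reward then gives little push toward states the policy has stopped visiting, which is one intuitive reading of the local-maximum problem described above.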
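Over the past 25 years, we have witnessed extensive application of machine learning to the compiler space, notably to the selection and phase-ordering problems. However, few of these works have been upstreamed into state-of-the-art compilers such as LLVM, where they could be seamlessly integrated into the compiler's optimization pipeline and readily deployed by users. MLGO was one of the first such projects, and it only strives to reduce the code size of a binary with an ML-based inliner trained with reinforcement learning. This paper presents MLGOPerf, the first end-to-end framework capable of optimizing performance using LLVM's ML inliner. It employs a secondary ML model to generate the rewards used to train a retargeted reinforcement learning agent, which MLGO previously used as its primary model. The secondary model predicts the speedup of the function under analysis, providing a fast training framework for the primary model that would otherwise be impractical. Experimental results show that MLGOPerf gains 1.8% and 2.2% over LLVM's optimization at O3 on the SPEC CPU2006 and Cbench benchmarks, respectively. Furthermore, the proposed approach yields 26% more opportunities to autotune code regions in our benchmarks, which can translate into an additional 3.7% speedup.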
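Although electronic health records are a rich source of data for biomedical research, these systems are not implemented uniformly across healthcare settings. Significant amounts of data may be missing because healthcare is fragmented and siloed electronic health records lack interoperability. Deleting cases with missing data can introduce severe bias into subsequent analyses, so some authors prefer to apply a multiple imputation (MI) strategy to recover the missing information. Several works in the literature report promising results with various multiple imputation algorithms that are now freely available for research. Unfortunately, there is still no consensus on which MI algorithm works best. Beyond the choice of MI strategy, the choice of imputation algorithm and its application settings is also crucial and challenging. In this paper, inspired by the seminal works of Rubin and van Buuren, we propose a methodological framework for evaluating and comparing multiple imputation techniques, with the aim of selecting the most valid one for computing inferences in clinical research. We apply the framework to validate, and extend to a larger cohort, the results of a previous study. In that study, we evaluated the influence of key patient descriptors and COVID-19 severity in patients with type 2 diabetes mellitus, using data provided by the National COVID Cohort Collaborative Enclave.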
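The framework above builds on Rubin's work on multiple imputation. As background, the sketch below applies Rubin's rules, the standard way to pool a scalar estimate across m imputed datasets. The pooled point estimate is the mean of the per-imputation estimates. Its total variance combines the average within-imputation variance with the between-imputation variance, inflated by (1 + 1/m). This sketch is not the authors' evaluation framework, and the example numbers are made up.

```python
from statistics import mean


def rubin_pool(estimates, variances):
    """Pool m per-imputation estimates and their variances with Rubin's rules.

    Returns (pooled_estimate, total_variance).
    """
    m = len(estimates)
    if m < 2 or len(variances) != m:
        raise ValueError("need at least two imputations with matching variances")
    q_bar = mean(estimates)  # pooled point estimate
    u_bar = mean(variances)  # average within-imputation variance
    b = sum((q - q_bar) ** 2 for q in estimates) / (m - 1)  # between-imputation variance
    total = u_bar + (1 + 1 / m) * b  # total variance of the pooled estimate
    return q_bar, total


# Three imputed datasets give estimates 1, 2, 3, each with variance 0.5:
# pooled estimate 2.0, total variance 0.5 + (4/3) * 1.0, about 1.833.
print(rubin_pool([1.0, 2.0, 3.0], [0.5, 0.5, 0.5]))
```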
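Effective exploration remains a significant challenge that prevents the deployment of reinforcement learning on many physical systems. This is especially true for systems with continuous, high-dimensional state and action spaces, such as robotic manipulators. The challenge is accentuated in sparse-reward settings, where the low-level state information required to design dense rewards is unavailable. Adversarial imitation learning (AIL) can partially overcome this barrier by leveraging expert-generated demonstrations of optimal behaviour, which essentially provide a replacement for dense reward information. Unfortunately, the availability of expert demonstrations does not necessarily improve an agent's ability to explore effectively. As we show empirically, it can lead to inefficient or stagnating learning. We present Learning from Guided Play (LfGP), a framework in which we leverage expert demonstrations of multiple auxiliary tasks in addition to a main task. A hierarchical model then learns each task's reward and policy through a modified AIL procedure, and a scheduler that composes the different tasks enforces exploration of all of them. This affords several benefits. Learning efficiency improves for main tasks with challenging bottleneck transitions, expert data becomes reusable between tasks, and transfer learning becomes possible by reusing the learned auxiliary task models. Our experimental results in a challenging multitask robotic manipulation domain indicate that our method compares favourably with supervised imitation learning and with a state-of-the-art AIL method. Code is available at https://github.com/utiasstars/lfgp.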
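One way to picture the scheduler described above is as a high-level controller that decides, every few timesteps, which task's policy acts next. The agent therefore keeps visiting states that matter for the auxiliary tasks even when the main-task policy would skip them. The sketch below is a hypothetical uniform scheduler in that spirit. The task names, the fixed switching period, and the probability `p_main` of picking the main task are illustrative assumptions, not LfGP's actual scheduler.

```python
import random


def schedule_episode(aux_tasks, main_task, episode_len, period, p_main=0.5, seed=None):
    """Hypothetical scheduler: every `period` steps, choose which task's policy acts.

    Returns a list with one task name per timestep.
    """
    rng = random.Random(seed)
    plan = []
    while len(plan) < episode_len:
        # Pick the main task with probability p_main, otherwise a uniform auxiliary task.
        task = main_task if rng.random() < p_main else rng.choice(aux_tasks)
        # Hold the choice for one period (truncated at the end of the episode).
        plan.extend([task] * min(period, episode_len - len(plan)))
    return plan


# Example: a 10-step episode that switches tasks every 3 steps.
print(schedule_episode(["reach", "lift"], "stack", episode_len=10, period=3, seed=0))
```

Holding each choice for a full period, rather than switching every step, gives each sub-policy enough time to make progress on its own task before control is handed off.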
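Learning or identifying dynamics from a sequence of high-dimensional observations is a difficult challenge in many domains, including reinforcement learning and control. The problem has recently been studied from a generative perspective through latent dynamics: high-dimensional observations are embedded into a lower-dimensional space in which the dynamics can be learned. Despite some successes, latent dynamics models have not yet been applied to real-world robotic systems, where the learned representations must be robust to a variety of perceptual confounds and noise sources. In this paper, we present a method that jointly learns a latent state representation and the associated dynamics, suitable for long-horizon planning and closed-loop control under perceptually difficult conditions. As our main contribution, we describe how our representation captures a notion of heteroscedastic, or input-specific, uncertainty at test time by detecting novel or out-of-distribution (OOD) inputs. We present results from prediction and control experiments on two image-based tasks: a simulated pendulum balancing task and a real-world robotic manipulator reaching task. We demonstrate that, in the presence of varying degrees of input degradation, our model produces more accurate predictions and exhibits improved control performance compared with a model that assumes only homoscedastic uncertainty.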
Nowadays, floods of time-stamped web documents related to general news queries spread across the Internet, and timeline summarization aims to concisely summarize how events evolve along a timeline. Unlike traditional document summarization, timeline summarization must model the temporal information of the input events and summarize important events in chronological order. To tackle this challenge, we propose a Unified Timeline Summarizer (UTS) that can generate both abstractive and extractive timeline summaries in time order. Concretely, in the encoder, we propose a graph-based event encoder that relates multiple events according to their content dependencies and learns a global representation of each event. In the decoder, to ensure the chronological order of the abstractive summary, we extract event-level attention during generation while retaining sequential information, and use it to simulate the evolutionary attention of the ground-truth summary. The event-level attention can also assist in extracting a summary, so that the extracted summary is likewise in chronological order. We augment an existing large-scale Chinese timeline summarization dataset and collect a new English timeline dataset. Extensive experiments on these datasets and on the out-of-domain Timeline 17 dataset show that UTS achieves state-of-the-art performance in both automatic and human evaluations.
Brain midline shift (MLS) is one of the most critical factors in clinical diagnosis and treatment decision-making for intracranial hemorrhage. Existing computational methods for MLS quantification not only require intensive labeling at millimeter-level precision but also perform poorly because they depend on specific landmarks or simplified anatomical assumptions. In this paper, we propose a novel semi-supervised framework that accurately measures the scale of MLS from head CT scans. We formulate MLS measurement as a deformation estimation problem and solve it using a few MLS slices with sparse labels. With the help of diffusion models, we are also able to use a large number of unlabeled MLS scans and 2793 non-MLS cases for representation learning and regularization. The extracted representation reflects how an image differs from a non-MLS image, and regularization plays an important role in the sparse-to-dense refinement of the deformation field. Experiments on a real clinical brain hemorrhage dataset show that our method achieves state-of-the-art performance and generates interpretable deformation fields.
Location-aware networks will introduce new services and applications for modern convenience, surveillance, and public safety. In this paper, we consider the problem of cooperative localization in a wireless network where the positions of certain anchor nodes can be controlled. We introduce an active planning method that moves the anchors so as to maximize the information gain of future measurements. In the control layer of the proposed method, control inputs are calculated by minimizing the traces of approximate inverse Bayesian Fisher information matrices (FIMs). The estimation layer computes estimates of the agent states and provides Gaussian representations of the marginal posteriors of agent positions to the control layer for the approximate Bayesian FIM computations. Receding horizon (RH) control is performed using a cost function that accumulates Bayesian FIM contributions over a sliding window of discrete future timesteps. We also discuss approximations that allow the resulting tree-search problem to be solved efficiently. A numerical case study demonstrates the intelligent behavior of a single controlled anchor in a 3-D scenario and the resulting significant improvement in localization accuracy.
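To make the control-layer criterion concrete, the sketch below scores sequences of anchor moves by accumulating FIM contributions over a short horizon and summing trace(J^{-1}), the A-optimality cost, at each step. Following the receding-horizon idea, it returns only the first move of the best sequence. Several simplifying assumptions are made here that differ from the paper: a single static agent in 2-D (not 3-D), range-only measurements with known noise sigma, a Gaussian prior summarized by its information matrix, a hypothetical five-move action set, and exhaustive tree search without the paper's approximations.

```python
import itertools
import math

# A symmetric 2x2 matrix [[a, b], [b, d]] stored compactly as (a, b, d).
Sym2 = tuple[float, float, float]
Point = tuple[float, float]

# Hypothetical discrete move set for the controlled anchor: stay, +x, -x, +y, -y.
MOVES: list[Point] = [(0.0, 0.0), (1.0, 0.0), (-1.0, 0.0), (0.0, 1.0), (0.0, -1.0)]


def range_fim(agent: Point, anchor: Point, sigma: float) -> Sym2:
    """Fisher information of one range measurement about the agent position: u u^T / sigma^2."""
    dx, dy = agent[0] - anchor[0], agent[1] - anchor[1]
    r = math.hypot(dx, dy)
    ux, uy = dx / r, dy / r  # unit vector from anchor to agent
    s = 1.0 / sigma**2
    return (s * ux * ux, s * ux * uy, s * uy * uy)


def add(m: Sym2, n: Sym2) -> Sym2:
    return (m[0] + n[0], m[1] + n[1], m[2] + n[2])


def trace_inv(m: Sym2) -> float:
    """trace(J^{-1}) for a positive-definite 2x2 information matrix J."""
    a, b, d = m
    return (a + d) / (a * d - b * b)


def plan_first_move(agent: Point, anchor: Point, prior: Sym2, sigma: float,
                    horizon: int, step: float = 1.0) -> Point:
    """Search all move sequences of length `horizon`; return the first move of the best one."""
    best_cost, best_move = math.inf, MOVES[0]
    for seq in itertools.product(MOVES, repeat=horizon):
        info, pos, cost = prior, anchor, 0.0
        for mx, my in seq:
            pos = (pos[0] + step * mx, pos[1] + step * my)
            if math.hypot(agent[0] - pos[0], agent[1] - pos[1]) < 1e-9:
                cost = math.inf  # range direction is undefined if the anchor sits on the agent
                break
            info = add(info, range_fim(agent, pos, sigma))  # accumulate FIM contributions
            cost += trace_inv(info)  # cost summed over the sliding window
        if cost < best_cost:
            best_cost, best_move = cost, seq[0]
    return best_move


# The prior is confident in x but not in y, so the anchor moves off the x-axis
# to gain information about the agent's y coordinate.
print(plan_first_move((0.0, 0.0), (3.0, 0.0), (10.0, 0.0, 1.0), sigma=1.0, horizon=1))
```

The exhaustive search costs |MOVES|^horizon evaluations, which is why approximations that make the tree search tractable matter for longer horizons.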
In this work, we introduce a hypergraph representation learning framework called Hypergraph Neural Networks (HNN) that jointly learns hyperedge embeddings along with a set of hyperedge-dependent embeddings for each node in the hypergraph. HNN derives multiple embeddings per node in the hypergraph where each embedding for a node is dependent on a specific hyperedge of that node. Notably, HNN is accurate, data-efficient, flexible with many interchangeable components, and useful for a wide range of hypergraph learning tasks. We evaluate the effectiveness of the HNN framework for hyperedge prediction and hypergraph node classification. We find that HNN achieves an overall mean gain of 7.72% and 11.37% across all baseline models and graphs for hyperedge prediction and hypergraph node classification, respectively.
Previous work has shown the potential of deep learning to predict renal obstruction from kidney ultrasound images. However, these image-based classifiers were designed for single-visit inference. We compare methods from video action recognition (i.e., convolutional pooling, LSTM, and TSM) for adapting single-visit convolutional models to multiple-visit inference. We demonstrate that incorporating images from a patient's past hospital visits provides only a small benefit for predicting obstructive hydronephrosis. Thus, although including prior ultrasounds is beneficial, prediction based on the latest ultrasound alone is sufficient for patient risk stratification.
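Pooling is the simplest of the adaptations listed above. A shared single-visit encoder produces one feature vector per ultrasound visit, the vectors are pooled element-wise across visits, and a classifier head scores the pooled vector. The sketch below assumes the per-visit features have already been computed. It uses plain lists in place of CNN activations, and its linear head and weights are hypothetical. Max or mean pooling ignores visit order, unlike the LSTM and TSM variants, which exploit the temporal sequence.

```python
import math


def pool_visits(visit_features, mode="max"):
    """Combine per-visit feature vectors (one list per visit) into a single vector."""
    if not visit_features:
        raise ValueError("need at least one visit")
    columns = list(zip(*visit_features))  # group the same feature across visits
    if mode == "max":
        return [max(col) for col in columns]
    if mode == "mean":
        return [sum(col) / len(col) for col in columns]
    raise ValueError(f"unknown pooling mode: {mode}")


def predict_obstruction(visit_features, weights, bias, mode="max"):
    """Hypothetical linear head on the pooled features, returning a probability."""
    pooled = pool_visits(visit_features, mode)
    logit = sum(w * x for w, x in zip(weights, pooled)) + bias
    return 1.0 / (1.0 + math.exp(-logit))


# Two visits with two features each; max pooling keeps the strongest response per feature.
print(pool_visits([[1.0, 5.0], [3.0, 2.0]]))  # [3.0, 5.0]
```

With a single visit, pooling returns that visit's features unchanged, so the multi-visit model reduces to the original single-visit classifier.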